@coolkp coolkp commented Oct 22, 2025

This PR adds:

  1. Optimizer state to the checkpoint (the checkpoint path is saved, and the whole optimizer state is saved).
  2. Loading of the optimizer and step, overwriting the optimizer in the training loop (be careful with shardings; they are inherited).
  3. Resuming training from the last checkpointed step.
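The steps above can be sketched in framework-neutral Python. This is a minimal illustration, not the PR's actual code: the helper names (`save_checkpoint`, `load_checkpoint`, `train`) and the toy momentum optimizer are hypothetical, and a real training setup would use a proper checkpoint library and respect array shardings when restoring optimizer state.

```python
import pickle
from pathlib import Path

# Hypothetical sketch: persist params, optimizer state, and step together,
# then resume training from the last checkpointed step.

def save_checkpoint(path, step, params, opt_state):
    """Save the whole training state (params, optimizer state, step)."""
    with open(path, "wb") as f:
        pickle.dump({"step": step, "params": params, "opt_state": opt_state}, f)

def load_checkpoint(path):
    """Return (step, params, opt_state); (0, None, None) if no checkpoint exists."""
    p = Path(path)
    if not p.exists():
        return 0, None, None
    with open(p, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["params"], ckpt["opt_state"]

def train(ckpt_path, total_steps=10, save_every=5):
    """Toy loop: resume from the checkpointed step, overwriting the fresh
    optimizer state with the loaded one (mirroring step 2 above)."""
    step, params, opt_state = load_checkpoint(ckpt_path)
    if params is None:  # no checkpoint: start from scratch
        params, opt_state = 0.0, {"momentum": 0.0}
    for step in range(step, total_steps):
        grad = 1.0  # stand-in gradient
        opt_state["momentum"] = 0.9 * opt_state["momentum"] + grad
        params -= 0.1 * opt_state["momentum"]
        if (step + 1) % save_every == 0:
            save_checkpoint(ckpt_path, step + 1, params, opt_state)
    return step + 1, params, opt_state
```

Calling `train` once to step 5, then again with a larger `total_steps`, continues from step 5 rather than restarting, because the loaded step, params, and optimizer state replace the freshly initialized ones.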

@coolkp coolkp merged commit 662d501 into main Oct 22, 2025
2 of 3 checks passed
